Semantic-based Automatic Text Classification Method

doi: 10.3969/j.issn.1006-2475.2010.11.003

Computer and Modernization (计算机与现代化), 2010, Vol. 1, Issue 11: 9-11,1.

HU Xiao-hui (胡晓辉), XU Ye-ke (徐也可), LIU Bin (刘斌)

Department of Information & Management Engineering, Jiangxi Vocational College of Mechanical & Electrical Technology, Nanchang 330013, China

Received: 2010-06-01  Online: 2010-11-25  Published: 2010-11-25

Abstract

Automatic text classification is the task of assigning predefined category labels to a document according to its content. Most existing text classification algorithms are built on the Vector Space Model (VSM), which cannot fully express the semantic features of a document, and classifier performance suffers as a result. To address this problem, this paper constructs a similarity matrix from the training documents, derives the topic information of each category from that matrix, and builds a classifier from the topic information. This classifier is then combined with a classical classifier to determine the category of a text. Experiments with the resulting system show that the proposed method substantially improves classifier performance.

Key words: text classification, semantic features, vector space model (VSM), graphical model, algorithm
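The abstract gives only the outline of the method, so the following is a minimal sketch of one way such a pipeline could be put together, not the authors' implementation. It assumes whitespace tokenization, cosine similarity for the similarity matrix, a centrality-weighted centroid as the per-category topic information, multinomial Naive Bayes as a stand-in for the classical classifier, and a linear mix controlled by a hypothetical weight `alpha`. None of these choices is stated in the source.

```python
import math
from collections import Counter, defaultdict


def tf(text):
    """Bag-of-words term frequencies (whitespace tokenization is an assumption)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-weight mappings."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def similarity_matrix(vectors):
    """Pairwise cosine similarities between training documents."""
    return [[cosine(u, v) for v in vectors] for u in vectors]


def topic_vector(vectors):
    """Hypothetical 'topic information' for one category: each document is
    weighted by its centrality (row sum of the similarity matrix) and the
    weighted term counts are summed into a single vector."""
    sim = similarity_matrix(vectors)
    weights = [sum(row) for row in sim]
    total = sum(weights) or 1.0
    topic = defaultdict(float)
    for vec, w in zip(vectors, weights):
        for term, count in vec.items():
            topic[term] += (w / total) * count
    return dict(topic)


def train(labeled_docs):
    """Build per-category topic vectors and Naive Bayes statistics."""
    by_cat = defaultdict(list)
    for text, cat in labeled_docs:
        by_cat[cat].append(tf(text))
    topics = {cat: topic_vector(vecs) for cat, vecs in by_cat.items()}
    term_counts = {cat: sum(vecs, Counter()) for cat, vecs in by_cat.items()}
    vocab = set().union(*term_counts.values())
    priors = {cat: len(vecs) / len(labeled_docs) for cat, vecs in by_cat.items()}
    return topics, term_counts, vocab, priors


def nb_posteriors(doc, term_counts, vocab, priors):
    """Multinomial Naive Bayes posteriors with Laplace smoothing
    (a stand-in for the unspecified classical classifier)."""
    logp = {}
    for cat, counts in term_counts.items():
        denom = sum(counts.values()) + len(vocab)
        logp[cat] = math.log(priors[cat]) + sum(
            n * math.log((counts[t] + 1) / denom)
            for t, n in doc.items() if t in vocab
        )
    top = max(logp.values())
    expd = {cat: math.exp(v - top) for cat, v in logp.items()}
    z = sum(expd.values())
    return {cat: e / z for cat, e in expd.items()}


def classify(text, model, alpha=0.5):
    """Mix the topic-based score with the Naive Bayes posterior.
    The linear combination and `alpha` are illustrative assumptions."""
    topics, term_counts, vocab, priors = model
    doc = tf(text)
    nb = nb_posteriors(doc, term_counts, vocab, priors)
    scores = {
        cat: alpha * cosine(doc, topics[cat]) + (1 - alpha) * nb[cat]
        for cat in topics
    }
    return max(scores, key=scores.get)


# Toy training data, invented purely for illustration.
training = [
    ("the team won the football match", "sports"),
    ("the striker scored a late goal", "sports"),
    ("the bank raised interest rates", "finance"),
    ("stock markets fell as rates rose", "finance"),
]
model = train(training)
print(classify("the goal decided the match", model))
```

Setting `alpha` to 0 reduces the sketch to the Naive Bayes baseline alone, and setting it to 1 uses only the topic vectors, which makes it easy to compare the two components on held-out data.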

CLC number: TP301.6

Cite this article: HU Xiao-hui, XU Ye-ke, LIU Bin. Semantic-based Automatic Text Classification Method[J]. Computer and Modernization, 2010, 1(11): 9-11,1.

